% -*- mode: latex; TeX-master: "dataflow"; TeX-command-default: "PDF"; -*-
%% Put local variables at beginning of file because Emacs function
%% `hack-local-variables` only looks backwards 3000 characters for the "Local
%% Variables" block, that number is hardcoded, and the "LocalWords" part of this
%% file is longer than that.


\section{Introduction}

This document describes a \emph{Dataflow Framework} for the Java
programming language.  This is an industrial-strength framework.
The Dataflow Framework is used in the
\href{https://checkerframework.org/}{Checker Framework},
Google's \href{http://errorprone.info/}{Error Prone},
Uber's \href{https://github.com/uber/NullAway}{NullAway},
Meta's \href{https://engineering.fb.com/2022/11/22/developer-tools/meta-java-nullsafe/}{Nullsafe},
and in other contexts.

The primary purpose of the Dataflow Framework is to estimate values:
for each line of source code, it determines properties for
each variable, that are true for every value the variable might contain.

The Dataflow Framework's result (\autoref{sec:analysis_result_class})
is an abstract value for each expression (an estimate of the
expression's run-time value) and a store at each program point.  A
store maps variables and other expressions to abstract
values.

As a pre-pass, the Dataflow Framework transforms an input
AST into a control flow graph (\autoref{sec:cfg}) consisting of basic
blocks made up of nodes representing single operations.  An analysis
operates over the control flow graph.  The effect of a single node on
the dataflow store is represented by a transfer function, which takes
an input store and a node and produces an output store.  Once the
analysis reaches a fixed point, the result can be accessed by client
code.

The \emph{easiest way} to create a dataflow analysis for Java is to use the
Checker Framework.  The Checker Framework performs type checking along with
flow-sensitive type inference, which is equivalent to a dataflow analysis or
an abstract interpretation.
In the Checker Framework, the abstract values to be computed are
annotated types.  You would define a type hierarchy corresponding to the
lattice for your dataflow analysis.
An individual checker can customize its analysis by
extending the abstract value class and by overriding the behavior of
the transfer function for particular node types.
This approach requires you to write very little code.  The downside is
that it will run relatively slowly because of the Checker Framework's rich
functionality.  If you do not need that functionality, you can write a
faster analysis directly on the Dataflow Framework.  Some users first
create an analysis using the Checker Framework, and then if it is not fast
enough, they rewrite it for the Dataflow Framework.

The Dataflow Framework was
designed with several goals in mind.  First, to encourage use
beyond the Checker Framework, it is written as a separate package that can be
built and used with no dependence on the Checker Framework.  Second,
the framework supports analysis but not
transformation, so it provides information that can be used by a type
checker or an IDE, but it does not support optimization.  Third, the
framework aims to minimize the burden on developers who build on top
of it.  In particular, the hierarchy of analysis classes is designed
to reduce the effort required to implement a new flow-sensitive type
checker in the Checker Framework. The
\href{https://docs.google.com/document/d/1oYzbOrrS4ZEEx4wQgIHbijNzcI5CiQAq_-1NrOS8JME/edit?usp=sharing}{Dataflow User's Guide}
gives an introduction to customizing dataflow to add checker-specific
enhancements.

% \begin{workinprogress}
%     Paragraphs colored in gray with a gray bar on the left side (just
%     like this one) contain questions, additional comments or indicate
%     missing parts. Eventually, these paragraphs will be removed.
% \end{workinprogress}


\subsection{Potential version conflicts if you export the Dataflow Framework}

If your tool uses the Dataflow Framework, then \textbf{please shade the
  Dataflow Framework in your tool}.  You can do this using the Maven Shade
Plugin, \url{https://maven.apache.org/plugins/maven-shade-plugin/}, or the
Gradle Shadow Plugin, \url{https://imperceptiblethoughts.com/shadow/}.
If you cannot shade the Checker Framework and Dataflow Framework classes, then
please ask the Checker Framework developers to create a shaded artifact
just for you, every time they release the Checker Framework.
See \code{dataflow/build.gradle} for instructions.

The reason to shade the Dataflow Framework is to permit users to run multiple tools.
Suppose that two tools both utilize the Dataflow Framework.
If the two tools use different versions of the Dataflow Framework,
then it may be impossible for users to run both tools due to a version conflict.
The Checker Framework uses an unshaded version the Dataflow Framework,
since the Dataflow Framework is part of the Checker Framework.
If you do not shade the Dataflow Framework in your tool, then users will
not be able to use both your tool and the Checker Framework (or any other
tool that does not shade the Checker Framework and Dataflow Framework classes).


\section{Organization}

\subsection{Projects}
% TODO: Update

The source code of the combined Checker Framework and Dataflow
Framework is divided into multiple projects: \code{javacutil},
\code{dataflow}, \code{framework}, and \code{checker},
which can be built into distinct jar files.  \code{checker.jar} is a fat
jar that contains all of these, plus the Stub Parser.

\code{javacutil} provides convenient interfaces to routines in
Oracle's javac library.  There are utility classes for interacting
with annotations, elements, trees and types, as well as
\code{InternalUtils}, which gives direct access to internal features
of javac that are not part of a supported interface.  There are
interfaces or abstract classes for reporting errors, for processing
types in an AST, and for providing the annotations present on an
Element.
The \code{org.checkerframework.javacutil.trees} package provides a
class to parse expressions into javac Trees (\code{TreeParser}), a
class to build new Trees from  scratch (\code{TreeBuilder}), and a
class to represent newly introduced variables that are not part of an
input program (\code{DetachedVarSymbol}).

\code{dataflow} contains the classes to represent and construct
control flow graphs and the base classes required for flow analysis.
These classes are described in detail in \autoref{sec:node_classes}.

\code{framework} contains the framework aspects of the Checker
Framework, including the derived classes for flow analysis of
annotated types which are described later in this document.

\code{checker} contains the type system-specific checkers.

The \code{dataflow} project depends only on \code{javacutil}.


\subsection{Classes}

This section gives an overview of the major Java classes and
interfaces in the implementation of the Dataflow Framework and the
flow-sensitive type checking feature of the Checker Framework.  It
includes both the base classes in the \code{dataflow} project and the
derived classes in the \code{framework} project.  The class and
interface declarations are given with full package names to indicate
which project they belong to.

\subsubsection{Nodes}
\label{sec:node_classes}

Dataflow doesn't actually work on trees; it works on Nodes.
A Node class represents an individual operation of a program,
including arithmetic operations, logical operations, method calls,
variable references, array accesses, etc.
Nodes
simplify writing a dataflow analysis by separating the dataflow
analysis from the original source code.
\autoref{tab:nodes} on page~\pageref{tab:nodes} lists
the Node types.

\begin{verbatim}
package org.checkerframework.dataflow.cfg.node;

abstract class Node
class *Node extends Node
\end{verbatim}


\subsubsection{Blocks}
\label{sec:block_classes}

The Block
classes represent basic blocks.

\begin{verbatim}
package org.checkerframework.dataflow.cfg.block;

interface Block
abstract class BlockImpl implements Block
interface SingleSuccessorBlock extends Block
abstract class SingleSuccessorBlockImpl extends BlockImpl implements SingleSuccessorBlock
\end{verbatim}

A RegularBlock contains no exception-raising operations and has a
single control-flow successor.

\begin{verbatim}
package org.checkerframework.dataflow.cfg.block;
interface RegularBlock extends SingleSuccessorBlock
class RegularBlockImpl extends SingleSuccessorBlockImpl implements RegularBlock
\end{verbatim}

An ExceptionBlock contains a single operation that may raise an
exception, with one or more exceptional successors and a single normal
control-flow successor.

\begin{verbatim}
package org.checkerframework.dataflow.cfg.block;
interface ExceptionBlock extends SingleSuccessorBlock
class ExceptionBlockImpl extends SingleSuccessorBlockImpl implements ExceptionBlock
\end{verbatim}

A SpecialBlock represents method entry or exit, including exceptional
exit which is represented separately from normal exit.

\begin{verbatim}
package org.checkerframework.dataflow.cfg.block;
interface SpecialBlock extends SingleSuccessorBlock
class SpecialBlockImpl extends SingleSuccessorBlockImpl implements SpecialBlock
\end{verbatim}

A ConditionalBlock contains no operations at all.  It represents a
control-flow split to either a ``then'' or an ``else'' successor based on
the immediately preceding boolean-valued Node.

\begin{verbatim}
package org.checkerframework.dataflow.cfg.block;
interface ConditionalBlock extends Block
class ConditionalBlockImpl extends BlockImpl implements ConditionalBlock
\end{verbatim}



\subsubsection{ControlFlowGraph}
\label{sec:control_flow_graph_class}

A ControlFlowGraph represents the body of a method or an initializer
expression as a graph of Blocks with distinguished entry, exit, and
exceptional exit SpecialBlocks.  ControlFlowGraphs are produced by the
CFGBuilder classes and are treated as immutable once they are built.

\begin{verbatim}
package org.checkerframework.dataflow.cfg;
class ControlFlowGraph
\end{verbatim}

\subsubsubsection{CFGBuilder}
\label{sec:cfg_builder_classes}

The CFGBuilder classes visit an AST and produce a corresponding
ControlFlowGraph as described in \autoref{sec:ast_to_cfg_translation}.

\begin{verbatim}
package org.checkerframework.dataflow.cfg;
class CFGBuilder
\end{verbatim}

The Checker Framework derives from CFGBuilder in order to desugar
enhanced for loops that make explicit use of type annotations provided
by the checker in use.

\begin{verbatim}
package org.checkerframework.framework.flow;
class CFCFGBuilder extends CFGBuilder
\end{verbatim}

\subsubsubsection{CFGVisualizeLauncher}
\label{sec:cfg_visualize_launcher_class}

The CFGVisualizeLauncher generates a DOT or String representation of
the control flow graph for a given method in a given class.

\begin{verbatim}
package org.checkerframework.dataflow.cfg;
class CFGVisualizeLauncher
\end{verbatim}

\subsubsection{JavaExpressions}
\label{sec:flow_expressions_class}

The Dataflow Framework records the abstract values of certain
expressions, called JavaExpressions: local variables, field accesses,
array accesses, references to \code{this}, and pure method calls.
JavaExpressions are keys in the store of abstract values.

\begin{verbatim}
package org.checkerframework.dataflow.analysis;
class JavaExpressions
\end{verbatim}

Java expressions that appear in method pre- and postconditions are
parsed into JavaExpressions using helper routines in
\code{org.checkerframework.dataflow.expression.JavaExpressionParseUtil}.


\subsubsection{AbstractValue}
\label{sec:abstract_value_classes}

AbstractValue is the internal representation of dataflow information
produced by an analysis.  An AbstractValue is an estimate about the
run-time values that an expression may evaluate to.  The client of the
Dataflow Framework defines the abstract value, so the information may
vary widely among different users of the Dataflow Framework, but they
share a common feature that one can compute the least upper bound of
two AbstractValues.

\begin{verbatim}
package org.checkerframework.dataflow.analysis;
interface AbstractValue<V extends AbstractValue<V>>
\end{verbatim}

For the Checker Framework, abstract values are essentially
AnnotatedTypeMirrors.

\begin{verbatim}
package org.checkerframework.framework.flow;
abstract class CFAbstractValue<V extends CFAbstractValue<V>> implements AbstractValue<V>
class CFValue extends CFAbstractValue<CFValue>
\end{verbatim}

For the Nullness Checker, abstract values additionally track the
meaning of PolyNull, which may be either Nullable or NonNull.  The
meaning of PolyNull can change when a PolyNull value is compared to
the null literal, which is specific to the Nullness Checker.  Other
checkers often also support a Poly* qualifier, but only the
Nullness Checker tracks the meaning of its poly qualifier using the
dataflow analysis.

\begin{verbatim}
package org.checkerframework.checker.nullness;
class NullnessValue extends CFAbstractValue<NullnessValue>
\end{verbatim}

\subsubsection{UnusedAbstractValue}
\label{sec:unused_abstract_value_class}

UnusedAbstractValue is an AbstractValue that is
not used during dataflow analysis. This class should
only be used as a type argument in transfer functions, for
analyses that do not need and do not have their own AbstractValue classes.

For example, LiveVariable analysis (\autoref{sec:live_variable}),
needs to compute the least upper bound of successors' stores, and
it is meaningless to further calculate the least upper bound of two
individual live variables at a smaller granularity.
Since there is no computation between two AbstractValues, LiveVariable
analysis uses the UnusedAbstractValue rather than
implementing a specific AbstractValue for this analysis.

\begin{verbatim}
package org.checkerframework.dataflow.analysis;
public final class UnusedAbstractValue implements AbstractValue<UnusedAbstractValue>
\end{verbatim}

\subsubsection{Store}
\label{sec:store_classes}

A Store is a set of dataflow facts computed by an analysis, so it is a
mapping from JavaExpressions to AbstractValues.  As with
AbstractValues, one can take the least upper bound of two Stores.

\begin{verbatim}
package org.checkerframework.dataflow.analysis;
interface Store<S extends Store<S>>
\end{verbatim}

The Checker Framework store restricts the type of abstract values it
may contain.

\begin{verbatim}
package org.checkerframework.framework.flow;
abstract class CFAbstractStore<V extends CFAbstractValue<V>,
        S extends CFAbstractStore<V, S>>
    implements Store<S>
class CFStore extends CFAbstractStore<CFValue, CFStore>
\end{verbatim}

An InitializationStore tracks which fields of the ``self'' reference
have been initialized.

\begin{verbatim}
package org.checkerframework.checker.initialization;
class InitializationStore<V extends CFAbstractValue<V>,
        S extends InitializationStore<V, S>>
    extends CFAbstractStore<V, S>
\end{verbatim}

A NullnessStore additionally tracks the meaning of PolyNull.

\begin{verbatim}
package org.checkerframework.checker.nullness;
class NullnessStore extends InitializationStore<NullnessValue, NullnessStore>
\end{verbatim}


\subsubsection{Transfer functions}
\label{sec:transfer_functions}

A transfer function (\autoref{sec:transfer_function_classes}) is
explicitly represented as a node visitor that takes a TransferInput
(\autoref{sec:transfer_input_classes}) and produces a TransferResult
(\autoref{sec:transfer_result_classes}).

\subsubsubsection{TransferInput}
\label{sec:transfer_input_classes}

The TransferInput represents the set of dataflow facts known to be
true immediately before the node to be analyzed.  A TransferInput may
contain a single store, or a pair of ``then'' and ``else'' stores when
following a boolean-valued expression.

\begin{verbatim}
package org.checkerframework.dataflow.analysis;
class TransferInput<V extends AbstractValue<V>,
        S extends Store<S>>
\end{verbatim}

\subsubsubsection{TransferResult}
\label{sec:transfer_result_classes}

A TransferResult is the output of a transfer function.  In other
words, it is the set of dataflow facts known to be true immediately
after a node.  A Boolean-valued expression produces a
ConditionalTransferResult that contains both a ``then'' and an ``else''
store, while most other Nodes produce a RegularTransferResult with a
single store.

\begin{verbatim}
package class org.checkerframework.dataflow.analysis;
abstract TransferResult<V extends AbstractValue<V>,
        S extends Store<S>>
class ConditionalTransferResult<V extends AbstractValue<V>,
        S extends Store<S>>
    extends TransferResult<A, S>
class RegularTransferResult<V extends AbstractValue<V>,
        S extends Store<S>>
    extends TransferResult<A, S>
\end{verbatim}

\subsubsubsection{TransferFunction}
\label{sec:transfer_function_classes}

A TransferFunction is a NodeVisitor that takes an input and produces an output.

\begin{verbatim}
package org.checkerframework.dataflow.analysis;
interface TransferFunction<V extends AbstractValue<V>,
        S extends Store<S>>
    extends NodeVisitor<TransferResult<V, S>, TransferInput<V, S>>
interface ForwardTransferFunction<V extends AbstractValue<V>,
        S extends Store<S>>
    extends TransferFunction<V, S>
interface BackwardTransferFunction<V extends AbstractValue<V>,
        S extends Store<S>>
    extends TransferFunction<V, S>
\end{verbatim}

The Checker Framework defines a derived class of TransferFunction to
serve as the default for most checkers.  The class constrains the type
of abstract values and it overrides many node visitor methods to
refine the abstract values in their TransferResults.

\begin{verbatim}
package org.checkerframework.framework.flow;
abstract class CFAbstractTransfer<V extends CFAbstractValue<V>,
        S extends CFAbstractStore<V, S>,
        T extends CFAbstractTransfer<V, S, T>>
    extends AbstractNodeVisitor<TransferResult<V, S>, TransferInput<V, S>>
    implements ForwardTransferFunction<V, S>

class CFTransfer extends CFAbstractTransfer<CFValue, CFStore, CFTransfer>
\end{verbatim}

The Initialization Checker's transfer function tracks which fields of
the ``self'' reference have been initialized.

\begin{verbatim}
package org.checkerframework.checker.initialization;
class InitializationTransfer<V extends CFAbstractValue<V>,
        T extends InitializationTransfer<V, T, S>,
        S extends InitializationStore<V, S>>
    extends CFAbstractTransfer<V, S, T>
\end{verbatim}

The Regex Checker's transfer function overrides visitMethodInvocation
to special-case the \code{isRegex} and \code{asRegex} methods.

\begin{verbatim}
package org.checkerframework.checker.regex;
class RegexTransfer extends CFAbstractTransfer<CFValue, CFStore, RegexTransfer>
\end{verbatim}


\subsubsection{Analysis}
\label{sec:analysis_classes}

An Analysis performs iterative dataflow analysis over a control flow
graph using a given transfer function.  Both forward and backward
analyses are supported.

\begin{verbatim}
package org.checkerframework.dataflow.analysis;
interface Analysis<V extends AbstractValue<V>,
        S extends Store<S>,
        T extends TransferFunction<V, S>>
abstract class AbstractAnalysis<V extends AbstractValue<V>,
        S extends Store<S>,
        T extends TransferFunction<V, S>>
    implements Analysis<V, S, T>
interface ForwardAnalysis<V extends AbstractValue<V>,
        S extends Store<S>,
        T extends ForwardTransferFunction<V, S>>
    extends Analysis<V, S, T>
interface BackwardAnalysis<V extends AbstractValue<V>,
        S extends Store<S>,
        T extends BackwardTransferFunction<V, S>>
    extends Analysis<V, S, T>
\end{verbatim}

The Checker Framework defines a derived class of Analysis for use as
the default analysis of most checkers.  This class adds information
about the type hierarchy being analyzed and acts as a factory for
abstract values, stores, and the transfer function.

\begin{verbatim}
package org.checkerframework.framework.flow;
abstract class CFAbstractAnalysis<V extends CFAbstractValue<V>,
        S extends CFAbstractStore<V, S>,
        T extends CFAbstractTransfer<V, S, T>>
    extends ForwardAnalysisImpl<V, S, T>

class CFAnalysis extends CFAbstractAnalysis<CFValue, CFStore, CFTransfer>
\end{verbatim}

The Nullness Checkers' analysis overrides the factory methods for
abstract values, stores, and the transfer function.

\begin{verbatim}
package org.checkerframework.checker.nullness;
class NullnessAnalysis extends CFAbstractAnalysis<NullnessValue,
        NullnessStore, NullnessTransfer>
\end{verbatim}

The RegexChecker's analysis overrides the factory methods for abstract
values, stores, and the transfer function.

\begin{verbatim}
package org.checkerframework.checker.regex;
class RegexAnalysis extends CFAbstractAnalysis<CFValue, CFStore, RegexTransfer>
\end{verbatim}


\subsubsection{AnalysisResult}
\label{sec:analysis_result_class}

An AnalysisResult preserves the dataflow information computed by an
Analysis for later use by clients.  The information consists of an
AbstractValue for each node in the CFG and a Store that is valid at
the start of each Block.  The AnalysisResult class can return
AbstractValues for either Nodes or Trees and it can re-run the
transfer function to compute Stores that are valid immediately before
or after any Tree.

\begin{verbatim}
package org.checkerframework.dataflow.analysis;
class AnalysisResult<V extends AbstractValue<V>,
        S extends Store<S>>
\end{verbatim}


\subsubsection{AnnotatedTypeFactory}
\label{sec:annotated_type_factory_classes}

AnnotatedTypeFactorys are not part of the Dataflow Framework, per se,
but they are parameterized by the Dataflow Framework classes that they
use.

\begin{verbatim}
package org.checkerframework.framework.type;
class AnnotatedTypeFactory implements AnnotationProvider
\end{verbatim}

In the Checker Framework, dataflow analysis is performed on demand,
one class at a time, the first time that a ClassTree is passed to
getAnnotatedType.  This is implemented in the abstract class
GenericAnnotatedTypeFactory with concrete implementation in
BaseAnnotatedTypeFactory.

\begin{verbatim}
package org.checkerframework.framework.type;
abstract class GenericAnnotatedTypeFactory<Checker extends BaseTypeChecker<?>,
        Value extends CFAbstractValue<Value>,
        Store extends CFAbstractStore<Value, Store>,
        TransferFunction extends CFAbstractTransfer<Value, Store, TransferFunction>,
        FlowAnalysis extends CFAbstractAnalysis<Value, Store, TransferFunction>>
    extends AnnotatedTypeFactory

package org.checkerframework.common.basetype;
class BaseAnnotatedTypeFactory
    extends GenericAnnotatedTypeFactory<CFValue, CFStore, CFTransfer, CFAnalysis>
\end{verbatim}


\section{The Control-Flow Graph}
\label{sec:cfg}

A control-flow graph (CFG) represents a single method or field
initialization.  (The Dataflow Framework performs an intra-procedural
analysis.  This analysis is modular and every method is considered in
isolation.)
This section also describes the translation from the abstract syntax tree
(AST) to the CFG\@.
We start with a simple example, then give a more formal
definition of the CFG and its properties, and finally describe the
translation from the AST to the CFG.

As is standard, a control-flow graph is a set of
basic blocks that are linked by control-flow edges. Possibly less
standard, every basic block consists of a sequence of so-called nodes,
each of which represents a minimal Java operation or expression.


\flow{CFGSimple}{.33}{1.1}{A simple Java code snippet to introduce the CFG.
In CFG visualizations, special basic blocks are shown as ovals;
conditional basic blocks are polygons with eight sides; and regular and exception
basic blocks are rectangles.}

Consider the method \code{test} of \autoref{fig:CFGSimple}. The if
conditional got translated to a \emph{conditional basic block}
(octagon) with two successors. There are also two special basic blocks
(ovals) to denote the entry and exit point of the method.


\subsection{Formal Definition of the Control-Flow Graph}
\label{sec:cfg-formal}

The control-flow graph models all paths that can possibly be taken by an
execution of the method.

\begin{definition}[Control-Flow Graph]
    A \emph{control-flow graph} consists of a set of \emph{basic
      blocks} and a set of directed edges between these basic blocks,
    some of which are labeled.
\end{definition}

\begin{definition}[Basic Block]
    A \emph{basic block} is a sequence of \emph{nodes}, where the only
    control flow between the nodes inside
    the basic block is sequential.  Furthermore, there is no
    control flow occurring between those nodes and nodes of other basic
    blocks, except between the last node of one block $b_1$ and the first node
    of another block $b_2$, if $b_2$ is a successor of $b_1$.  A basic
    block may have multiple successors.
\end{definition}



\begin{definition}[Types of Basic Blocks]
    There are four \emph{types} of basic blocks in a control-flow graph:
    \begin{enumerate}
        \item \textbf{Regular basic block.} A \emph{regular basic
          block} contains any non-empty sequence of nodes and has
          exactly one successor.  None of the nodes in the block can
          throw an exception at run time.

        \item \textbf{Special basic blocks.} A \emph{special basic
          block} contains the empty sequence of nodes (i.e., is empty)
          and denotes either the entry or one of the exit blocks of a
          method. There are three types of special basic blocks:
        \begin{itemize}
            \item Entry block. This basic block is the (only) entry
              point of the method and thus is the only basic block
              without predecessors.
            \item Exit block. This basic block denotes the (normal)
              exit of a method, and it has no successors.
            \item Exceptional exit block, which indicates exceptional
              termination of the method. As an exit block, this block
              has no successors.
        \end{itemize}
        Every method has exactly one entry block, zero or one exit blocks,
        and zero or one exceptional exit blocks. There is always
        either an exit block, an exceptional exit block, or both.

        \item \textbf{Exception basic block.} An \emph{exception basic
          block} contains exactly one node that \emph{might} throw an
          exception at run time (e.g., a method call).  There are zero
          or one non-exceptional successors (only a basic block containing a
          \code{throw} statement does not have a non-exceptional
          successor).  There are one or more
          exceptional successors (see \autoref{def:edges}). In all
          cases there is at least one successor (regular or
          exceptional).

        \item \textbf{Conditional basic block.} A \emph{conditional
          basic block} does not contain any nodes and is used as a
          \emph{split point} after the execution of a node of boolean
          type. It has exactly two successors (both non-exceptional):
          the \emph{then} successor that is reached when the previous node
          evaluates to true and the \emph{else} successor that is reached
          when the previous node evaluates to false.  There is always
          exactly a single predecessor block for every conditional
          basic block, which is either a regular basic block or an
          exception basic block. In both cases, the last node in the
          predecessor will be of boolean type and the boolean value
          controls which successor of the conditional block is
          executed.
    \end{enumerate}
\end{definition}

The Java implementation of the four block types above is described in
\autoref{sec:block_classes}.

\begin{definition}[Control-Flow Graph Edges]
\label{def:edges}
The basic blocks of a control-flow graph are connected by directed
\emph{edges}.  If $b_1$ and $b_2$ are connected by a directed edge
$(b_1,b_2)$, we call $b_1$ a predecessor of $b_2$, and we call $b_2$
a successor of $b_1$.  In a control-flow graph, there are three
types of edges:
\begin{enumerate}
    \item \textbf{Exceptional edges}. An \emph{exceptional edge}
      connects an exception basic block with its exceptional
      successors, and it is labeled by the most general exception that
      might cause execution to take this edge during run time.  Note
      that the outgoing exceptional edges of a basic block do not need
      to have mutually exclusive labels; the semantics is that the
      control flow follows the most specific edge. For instance, if
      one edge is labeled with type \code{A} and another is labeled
      with type \code{B} where \code{B} is a subtype of \code{A}, then
      the execution only takes the first edge if the exception is of a
      subtype of \code{A}, but not a subtype of \code{B}.

\begin{workinprogress}
There is not necessarily a most specific exception type in the program
text; in that case, does the translation add a most specific case that will
never be executed at run time?

In general, what is the relation of the ordering in source code to the
one here?
\end{workinprogress}

    There is at most one successor for every exception type.

    \item \textbf{Conditional edges}. A \emph{conditional edge} is a
      non-exceptional edge that connects a conditional basic block
      with one of its successors, and is labeled with either ``true''
      or ``false''.

    \item \textbf{Regular, non-conditional edge.} Any other edge is a
      \emph{regular edge}, and does not carry a label. Only regular
      basic blocks, the entry basic block, and exception basic blocks
      have outgoing regular edges.
\end{enumerate}
\end{definition}


\begin{definition}[Nodes]
    \label{def:node}
    A \emph{node} is a minimal Java operation or expression.  It is
    minimal in the sense that it cannot be decomposed further into
    subparts between which control flow occurs. Examples for such
    nodes include integer literals, an addition node (which performs
    the mathematical addition of two nodes) or a method call.  Control
    flow such as \code{if} and \code{break} are not represented as
    nodes.  The full list of nodes is given in \autoref{tab:nodes} and
    several of them are described in more detail in
    \autoref{sec:noteworthy-translations}.

    It is important to note that, even though nodes can contain
    references to other nodes, it is only the ``top-level'' node which
    is considered at that point in the basic block. In the example of
    the addition node, this means that only the addition operation is
    to be executed, and its operands would occur earlier in the
    control-flow graph (as they are evaluated first, before performing
    the addition).
\end{definition}

In the visualization, a string representation of the node is used,
followed by the node type in square brackets. Note that the string
representation often also includes more than just the ``top-level''
node. For instance, the addition expression \code{a + b[0];} will
appear as ``a + b[0] [ NumericalAddition ]'' rather than ``a'' plus
some temporary variable.  This is done for clarity, so that it is easy
to see what expressions are summed up and because we don't create
internal names for expression results.

\autoref{tab:nodes} lists all node types in the framework.
All classes are in package \code{org.checkerframework.dataflow.cfg.node}.

% Defining a command for node types makes it easier to determine all the
% node types that the manual discusses.  One can compare this list with the
% source code to ensure that every node type is listed.
\newcommand{\nodetype}[1]{\code{#1}}

    \begin{longtable}{lp{0.4\linewidth}l}
        \midrule
        \multicolumn{3}{c}{\autoref{tab:nodes}: All node types in the Dataflow Framework.} \\ \\
        \textbf{Node type} & \textbf{Notes} & \textbf{Example} \\ \midrule \endfirsthead

        \textbf{Node type} & \textbf{Notes} & \textbf{Example} \\ \midrule \endhead
        \hline \multicolumn{3}{|c|}{{Continued on next page}} \\ \hline \endfoot
        \endlastfoot

        \nodetype{Node} & The base class of all nodes & \\
        \midrule

        \nodetype{ValueLiteralNode} & The base class of literal value nodes & \\
        \nodetype{BooleanLiteralNode} & & \code{true} \\
        \nodetype{CharacterLiteralNode} & & \code{'c'} \\
        \nodetype{DoubleLiteralNode} & & \code{3.14159} \\
        \nodetype{FloatLiteralNode} & & \code{1.414f} \\
        \nodetype{IntegerLiteralNode} & & \code{42} \\
        \nodetype{LongLiteralNode} & & \code{1024L} \\
        \nodetype{NullLiteralNode} & & \code{null} \\
        \nodetype{ShortLiteralNode} & & \code{512} \\
        \nodetype{StringLiteralNode} & & \code{"memo"} \\
        \midrule

        & Accessor expressions & \\
        \nodetype{ArrayAccessNode} & & \code{args[i]} \\
        \nodetype{FieldAccessNode} & & \code{f}, \code{obj.f} \\
        \nodetype{MethodAccessNode} & & \code{obj.hashCode} \\
        \nodetype{LocalVariableNode} & Use of a local variable, either as l-value or r-value & \\
        \nodetype{ThisNode} & Base class of references to \code{this} & \\
        \nodetype{ExplicitThisNode} & Explicit use of \code{this} in an expression & \\
        \nodetype{ImplicitThisNode} & Implicit use of \code{this} in an expression & \\
        \nodetype{SuperNode} & Explicit use of \code{super} in expression & \code{super(x, y)} \\
        \midrule

        \nodetype{MethodInvocationNode} & Note that access and invocation are distinct & \code{hashCode()} \\
        \midrule

        & Expression supertypes & \\
        \nodetype{BinaryOperationNode} & Base class of binary expression nodes & \\
        \nodetype{UnaryOperationNode} & Base class of unary expression nodes & \\

        & Arithmetic and logical operations & \\
        \nodetype{BitwiseAndNode} & & a \& \code{b} \\
        \nodetype{BitwiseComplementNode} & & \verb|~b| \\
        \nodetype{BitwiseOrNode} & & \code{a | b} \\
        \nodetype{BitwiseXorNode} & & \code{a ^ b} \\
        \nodetype{ConditionalAndNode} & Short-circuiting & a \&\& \code{b} \\
        \nodetype{ConditionalNotNode} & & \code{!a} \\
        \nodetype{ConditionalOrNode} & Short-circuiting & \code{a || b} \\
        \nodetype{FloatingDivisionNode} & & \code{1.0 / 2.0} \\
        \nodetype{FloatingRemainderNode} & & \code{13.0 \% 4.0} \\
        \nodetype{LeftShiftNode} & & \code{x << 3} \\
        \nodetype{IntegerDivisionNode} & & \code{3 / 2} \\
        \nodetype{IntegerRemainderNode} & & \code{13 \% 4} \\
        \nodetype{NumericalAdditionNode} & & \code{x + y} \\
        \nodetype{NumericalMinusNode} & & \code{-x} \\
        \nodetype{NumericalMultiplicationNode} & & \code{x * y} \\
        \nodetype{NumericalPlusNode} & & \code{+x} \\
        \nodetype{NumericalSubtractionNode} & & \code{x - y} \\
        \nodetype{SignedRightShiftNode} & & \code{x >> 3} \\
        \nodetype{StringConcatenateNode} & & \code{s + ".txt"} \\
        \nodetype{SwitchExpressionNode} & & \\
        \nodetype{TernaryExpressionNode} & & \code{c ? t : f} \\
        \nodetype{UnsignedRightShiftNode} & & \code{x >>> 5} \\
        \midrule

        & Relational operations & \\
        \nodetype{EqualToNode} & & \code{x == y} \\
        \nodetype{NotEqualNode} & & \code{x != y} \\
        \nodetype{GreaterThanNode} & & \code{x > y} \\
        \nodetype{GreaterThanOrEqualNode} & & \code{x >= y} \\
        \nodetype{LessThanNode} & & \code{x < y} \\
        \nodetype{LessThanOrEqualNode} & & \code{x <= y} \\
        \midrule

        \code{SwitchExpression} & A switch expression (not statement!) & \\

        \nodetype{CaseNode} & Case of a switch.  Acts as an equality test. & \\
        \midrule

        \nodetype{AssignmentNode} & & \code{x = 1} \\
        \midrule

        \nodetype{ArrayCreationNode} & & \code{new double[]} \\
        \nodetype{ObjectCreationNode} & & \code{new Object()} \\
        \midrule

        \nodetype{FunctionalInterfaceNode} & A member reference or lambda & \\
        \midrule

        \nodetype{TypeCastNode} & & \code{(float) 42} \\
        \nodetype{InstanceOfNode} & & \code{x instanceof Float} \\
        \midrule

        & Conversion nodes & \\
        \nodetype{NarrowingConversionNode} & Implicit conversion & \\
        \nodetype{StringConversionNode} & Might be implicit & \code{obj.toString()} \\
        \nodetype{WideningConversionNode} & Implicit conversion & \\
        \midrule

        \midrule
        & \textbf{Non-value nodes} & \\

        & Types appearing in expressions, such as \code{MyType.class} & \\
        \nodetype{ArrayTypeNode} & & \\
        \nodetype{ParameterizedTypeNode} & & \\
        \nodetype{PrimitiveTypeNode} & & \\
        \midrule

        \nodetype{ClassNameNode} & Identifier referring to Java class or interface & \code{java.util.HashMap} \\
        \nodetype{PackageNameNode} & Identifier referring to Java package & \code{java.util} \\
        \midrule

        \nodetype{ThrowNode} & Throw an exception & \\
        \nodetype{ReturnNode} & Return from a method & \\
        \midrule

        \nodetype{AssertionErrorNode} & & \code{assert x != null : "Hey"} \\
        \midrule

        \nodetype{ExpressionStatementNode} & An expression that is used as a statement & \code{m();} \\
        \midrule

        \nodetype{SynchronizedNode} & Start or end of a synchronized code block & \\
        \nodetype{MarkerNode} & No-op node used to annotate a CFG with
        information of the underlying Java source code.  Mostly useful
        for debugging and visualization. An example is indicating the
        start/end of switch statements. & \\
        \midrule

        \nodetype{NullChkNode} & Null checks inserted by javac & \\
        \midrule

        \nodetype{VariableDeclarationNode} & Declaration of a local variable & \\
        \nodetype{ClassDeclarationNode} & Declaration of a class & \\
        \midrule

        \nodetype{LambdaResultExpressionNode} & Body of a single-expression lambda & \\
        \midrule

        \caption{All node types in the Dataflow Framework.}
        \label{tab:nodes}
    \end{longtable}


In theory, nearly any statement can throw an \code{Error} such as
\code{OutOfMemoryError} or \code{NoSuchFieldError}.  The Dataflow Framework
does not represent all that possible flow.  It only creates the exceptional
edges shown in \autoref{tab:nodesWithException}.

\begin{table}
    \begin{tabular}{ll}
        \hline
        \textbf{Node type} & \textbf{Exception type} \\  \hline

        \code{ArrayAccessNode} & \code{NullPointerException}, \code{ArrayIndexOutOfBoundsException} \\
        \code{FieldAccessNode} & \code{NullPointerException} \\
        \code{MethodAccessNode} & \code{NullPointerException} \\
        \code{MethodInvocationNode} & \code{Throwable}, types in throws clause of the signature \\
        \code{IntegerDivisionNode} & \code{ArithmeticException} \\
        \code{IntegerRemainderNode} & \code{ArithmeticException} \\
        \code{ObjectCreationNode} & \code{Throwable}, types in throws clause of the signature \\
        \code{ArrayCreationNode} & \code{NegativeArraySizeException}, \code{OutOfMemoryError} \\
        \code{TypeCastNode} & \code{ClassCastException} \\
        \code{ThrowNode} & Type of \code{e} when \code{throw e} \\
        \code{AssertionErrorNode} & \code{AssertionError} \\
        \code{ClassNameNode} & \code{ClassCircularityError}, \code{ClassFormatError}, \\
        & \code{NoClassDefFoundError}, \code{OutOfMemoryError} \\
        \hline
    \end{tabular}

        \caption{All node types that could throw Exception, and the types
          to be thrown.
          All exception types are in package \code{java.lang}.}
        \label{tab:nodesWithException}
\end{table}

Other AST constructs are desugared into the above nodes (see \autoref{sec:desugaring}).

% \begin{workinprogress}
% Can \code{StringConversionNode} be implicit?  I think so, but in any event discuss.
% \end{workinprogress}


\subsection{Noteworthy Translations and Node Types}
\label{sec:noteworthy-translations}

In this section we mention any non-straightforward translations from the AST to
the CFG, or special properties about individual nodes.


\subsubsection{Program Structure}
\label{sec:prog-structure}

Java programs are structured using high-level programming constructs
such as loops, if-then-else constructs,
try-catch-finally blocks or switch statements.  During the translation
from the AST to the CFG some of this program structure is lost and all
non-sequential control flow is represented by two low-level
constructs: conditional basic blocks and control-flow edges between
basic blocks. For instance, a \code{while} loop is translated into its
condition followed by a conditional basic block that models the two
possible outcomes of the condition: either the control flow follows
the ``true'' branch and continues with the loop's body, or control goes to the
``false'' successor and executes the first statement after the loop.


\subsubsection{Assignment}

As described in \jlsref{15.26.1}, the execution of an assignment is in
general not strictly left-to-right. Rather, the right-hand side might
be evaluated even if the left-hand side of the assignment causes an
exception. This semantics is faithfully represented in the CFG
produced by the translation.  An example of a field assignment
exhibiting this behavior is shown in \autoref{fig:CFGFieldAssignment}.

\flow{CFGFieldAssignment}{.33}{1}{Control flow for a field assignment is not strictly
left-to-right (cf.\ \jlsref{15.26.1}),
which is properly handled by the translation.}

\subsubsection{Postfix/Prefix Increment/Decrement}
\label{sec:postpre-incdec}
Postfix and prefix increment and decrement have a side effect to
update the variable or field. To represent this side effect, the Dataflow
Framework creates an artificial assignment node like \code{n = n + 1}
for \code{++n} or \code{n++}. This artificial assignment node is stored
in \code{unaryAssignNodeLookup} of \code{ControlFlowGraph}. The assignment
node is also stored in \code{treeLookup} for prefix increment or decrement
so that the result of it is after the assignment. However, the node before
the assignment is stored in \code{treeLookup} for postfix increment or decrement
because the result of it should be before the assignment. For further information
about node-tree mapping, see \autoref{sec:conversions}.

\subsubsection{Conditional stores}
\label{sec:cond-stores}

The Dataflow Framework extracts information from control-flow splits
that occur in \code{if}, \code{for}, \code{while}, and \code{switch}
statements and in conditional operator expressions.  In order to have
the information available at the split, we eagerly produce two stores
contained in a \code{ConditionalTransferResult} after certain
boolean-valued expressions.  The stores are called the \emph{then} and
\emph{else} stores.  So, for example, after the expression \code{x ==
  null}, two different stores will be created.  The Nullness Checker
would produce a then store that maps \code{x} to @Nullable and an else
store that maps \code{x} to @NonNull.

The Dataflow Framework allows a maximum of two stores and when there
are two distinct stores, they always refer to the most recent
boolean-valued expression.  Stores are propagated through most nodes
and they are reversed for conditional not expressions.  The transfer
functions for many nodes merge conditional stores back together
because they cannot maintain the distinction between them.  Merging
just means taking the least upper bound of the two stores and it
happens automatically by calling \code{TransferInput.getRegularStore}.

Assignments usually merge conditional stores. For the translation of
conditional operator expressions, a special non-merging assignment
node is used (see \autoref{sec:cond-exp} below).


\subsubsection{Branches}

The control flow graph represents all non-exceptional control-flow
splits, or branches, as \code{ConditionalBlock}s that contain no
nodes.  If there is one store flowing into a conditional block, then
it is duplicated to both successors.  If there are two stores flowing
into a conditional block, the then store is propagated to the block's
then successor and the else store is propagated to the block's else
successor.

Consider the control flow graph generated for the simple if statement
in \autoref{fig:CFGIfStatement}.  The conditional expression \code{b1}
immediately precedes the \code{ConditionalBlock}, represented by the
octagonal node.  The \code{ConditionalBlock} is followed by both a
then and an else successor block, after which control flow merges back
together at the exit block.  The edge labels \code{EACH_TO_EACH},
\code{THEN_TO_BOTH}, and \code{ELSE_TO_BOTH} are flow rules described
in \autoref{sec:flow-rules}.  As described above, the then store
propagates to (both stores of) the block's then successor according to
rule \code{THEN_TO_BOTH} and the else store propagates to (both stores
of) the block's else successor according to rule \code{ELSE_TO_BOTH}.
More precise rules are used to preserve dataflow information for
short-circuiting expressions, as described in \autoref{sec:cond-exp}.

\flow{CFGIfStatement}{.33}{1.25}{Example of an if statement translated into a
  \code{ConditionalBlock}.}

\subsubsection{Conditional Expressions}
\label{sec:cond-exp}

The conditional and (\code{&&}, cf. \jlsref{15.23}) and the
conditional or (\code{||}, cf. \jlsref{15.24}) expressions are subject
to short-circuiting: if evaluating the left-hand side already
determines the result, then the right-hand side is not evaluated. This
semantics is faithfully represented in the constructed CFG and more
precise flow rules (\autoref{sec:flow-rules}) are used to preserve
additional dataflow information.

An example program using conditional or is shown in
\autoref{fig:CFGConditionalOr}.  Note that the CFG correctly
represents short-circuiting.  The expression \code{b2 || b3} is only
executed if \code{b1} is false and \code{b3} is only evaluated if
\code{b1} and \code{b2} are false.

Observe in \autoref{fig:CFGConditionalOr} that the flow rule between
the first conditional block and its then successor is
\code{THEN_TO_THEN}, rather than the default flow rule for such edges
\code{THEN_TO_BOTH}, which is present on the edge from the last
conditional block to its then successor.  \code{THEN_TO_THEN} is a
more precise rule which propagates the then store from the predecessor
of the conditional block to the then store of the then successor and
leaves the else store of the successor untouched.  This is a valid
rule for propagating information along the short-circuit edge of a
conditional or expression because \code{b1 || (b2 || b3)} being false
implies that \code{b1} is false, so dataflow information that obtains
when \code{b1} is true has no effect on the dataflow information
obtains when \code{b1 || (b2 || b3)} is false.  To put it another way,
if control reaches the block containing \code{b1 || (b2 || b3)} and
that expression is false, then control must have flowed along the else
branches of both conditional blocks and only the facts that obtain
along those edges need to be kept in the else store of the block
containing \code{b1 || (b2 || b3)}.

\flow{CFGConditionalOr}{.33}{1.33}{Example of a conditional or expression
  (\code{||}) with short-circuiting and more precise flow rules.}

Conditional operator expressions (using the ternary conditional operator
\code{? :}, cf. \jlsref{15.25}) use an additional synthetic local
variable in the CFG. The second (then) and third (else) expressions are
assigned to this local variable. In contrast to other assignments, this
assignment does not merge the stores, to allow the propagation of
conditional stores from these sub-expressions.
The \code{TernaryExpressionNode} at the merge of the two branches can
then either be treated like a local variable read or return a
conditional store.

\flow{CFGConditionalOpExpr}{.33}{.33}{Example of a conditional operator
  expression (\code{? :}).}


\subsubsection{Implicit \code{this} access}

The Java compiler AST uses the same type (\code{IdentifierTree}) for
local variables and implicit field accesses (where \code{this.} is
left out).  To relieve the user of the Dataflow Framework from
manually determining the two cases, we consistently use
\code{FieldAccessNode} for field accesses, where the receiver might be
an \code{ImplicitThisNode}.


\subsubsection{Assert statements}
\label{sec:assert-stmts}

Assert statements are treated specially by the CFG builder because it
is unknown at CFG construction time whether or not assertions will be
enabled when the program is run.  When assertions are enabled, the
dataflow information gained by analyzing the assert statement can
improve precision and allow the programmer to avoid redundant
annotations.  However, when assertions are disabled, it would be
unsound to assume that they had any effect on dataflow information.

The user of the Dataflow Framework may specify that
assertions are enabled or disabled.  When assertions are assumed to be
disabled, no CFG Nodes are built for the assert statement.
When assertions are assumed to be enabled, CFG Nodes are built to
represent the condition of the assert statement and, in the else
successor of a ConditionalBlock, CFG Nodes are built to represent the
detail expression of the assert, if any.

If assertions are not assumed to be enabled or disabled, then
the CFG is conservative and represents the fact that the
assert statement may execute or may not.  This takes the form of a
ConditionalBlock that branches on a fake variable.  For example, see
\autoref{fig:CFGAssert}.  The fake variable named
\code{assertionsEnabled#num0} controls the first ConditionalBlock.
The then successor of the ConditionalBlock is the same subgraph of CFG
Nodes that would be created if assertions were assumed to be enabled,
while the else successor of the ConditionalBlock is the same, empty,
subgraph of CFG Nodes that would be created if assertions were assumed
to be disabled.

\flow{CFGAssert}{.15}{2.9}{Example of an assert statement translated with
  assertions neither assumed to be enabled nor assumed to be
  disabled.}


\subsubsection{Varargs method invocation}
\label{sec:varargs}

In Java, varargs in method or constructor invocation is compiled
as new array creation (cf.\ \jlsref{15.12.4.2}). For example,
\code{m(1, 2, 3)} will be compiled as \code{m(new int[]\{1, 2, 3\})}
when the signature of \code{m} is \code{m(int... args)}. Dataflow
Framework creates an \code{ArrayCreationNode} with initializer for varargs
in the same way as the Java compiler does.
Note that it doesn't create an \code{ArrayCreationNode}
when the varargs is an array with the same depth as the type of
the formal parameter, or if \code{null} is given as the actual varargs argument.

\subsubsection{Default case and fall through for switch statement}
\label{sec:default-switch}

A switch statement is handled as a chain of \code{CaseNode} and nodes in
the case. \code{CaseNode} makes a branch by comparing the equality of
the expression of the switch statement and the expression of the case.
Note that the expression of a switch statement must be executed only
once at the beginning of the switch statement. To refer to its value, a fake variable
is created and it is assigned to a fake variable. A \code{THEN_TO_BOTH}
edge goes to nodes in the case and a \code{ELSE_TO_BOTH} edge goes to the next
\code{CaseNode}. When the next case is the default case, it goes to nodes in the default
case. If a break statement is in nodes, it creates an edge to next node of
the switch statement. If there is any possibility of fall-through, an edge
to the first node in the next case is created after nodes in the case.
For example, see \autoref{fig:CFGSwitch}. The fake variable named \code{switch#num0}
is created and each case node creates the branches.

\flow{CFGSwitch}{.21}{1.45}{Example of a switch statement with case, default and fall through.}

\subsubsection{Switch expressions}
\label{sec:switch-expr}

A switch expression is translated similarly to a switch statement.
An additional fake variable \code{switchExpr} is used to store the value
from the cases. The \code{SwitchExpressionNode} itself can then be
handled like a read of that variable.

% TODO: add an example

\subsubsection{Handling \code{finally} blocks}
\label{sec:try-finally}

Control flow statements, like \code{return}, \code{break}, and \code{continue},
within \code{try} blocks will cause execution of the \code{finally} block before
continuing at the target of the jump. The Dataflow Framework models this
behavior by adding a jump to a duplicate of the finally block before the
jump to the original target of the control flow statement.


\subsection{AST to CFG Translation}
\label{sec:ast_to_cfg_translation}

This section gives a high-level overview of the translation process
from the abstract syntax tree to the control-flow graph as described
in \autoref{sec:cfg-formal}.

First, we define several entities, which will be used in the translation.

\begin{definition}[Extended Node]
    In the translation process the data type \emph{extended node} is used.
    An extended node can be one of four possibilities:
    \begin{itemize}
        \item \textbf{Simple extended node.} An extended node can just be a
          wrapper for a node that does not throw an exception,
          as defined in Definition~\ref{def:node}.
        \item \textbf{Exception extended node.} Similar to a simple
          node, an exception extended node contains a node, but this
          node might throw an exception at run time.
        \item \textbf{Unconditional jump.} An unconditional jump
          indicates that control flow proceeds non-sequentially to a
          location indicated by a target label.
        \item \textbf{Conditional jump.} A conditional jump can follow
          an extended node that contains a node of boolean type. It
          contains two target labels, one if the node evaluates to
          true and one for false.
    \end{itemize}
\end{definition}

\textbf{Comparison of nodes and extended nodes.}
Nodes themselves never contain control flow information; they only
represent computation.

An extended node is a wrapper around a node that represents control flow
information.  It contains:  a node, a label, a predecessor, and a
successor.

An extended node is temporarily used to keep track of some control flow
information.  Later, the basic block data structures are created, and they
represent the control flow.  (And at that point the extended nodes are
discarded.)
\begin{definition}[Label]
    A \emph{label} is a marker that is used to refer to extended
    nodes. It is used only temporarily during CFG construction.
\end{definition}


The process of translating an AST to a CFG proceeds in three distinct phases.
\begin{enumerate}
    \item \textbf{Phase one.} In the first phase, a single linear
      sequence of extended nodes is created. The control flow is
      implicitly assumed to be sequential through the sequence of
      extended nodes, until a (conditional or unconditional) jump is
      encountered in an if, for, while, or switch statement, in which
      case the jump target decides where execution proceeds.

    The labels used as targets of jumps are associated with positions
    in this sequence and are managed by maintaining a binding function
    from labels to sequence positions. The advantage of having this
    indirection is that one can create a label and associate with the
    next free position in the sequence, without knowing which exact
    extended node will be placed there.  Furthermore, labels can be
    created and used before they are actually bound to their correct
    position in the sequence (e.g., when that position is not yet
    known).  At the end, the binding function can be used to resolve
    labels to extended nodes.

    Furthermore, phase one also computes a mapping from AST tree
    elements to nodes, as well as a set of leaders. A \emph{leader} is
    an extended node for which one of the following conditions
    applies:
    \begin{itemize}
    \item It is the first extended node in the sequence.
    \item It is the target of a jump (i.e. there is a label bound to
      the location of the node in the sequence).
    \item It is the first node following a jump.
    \end{itemize}

    \item \textbf{Phase two.} Phase two translates the linear
      representation to a control-flow graph by performing the
      following transformations:
    \begin{itemize}
        \item Simple extended nodes are translated to regular basic
          blocks, where multiple nodes can be grouped in one regular
          basic block.
        \item Exception extended nodes are translated to exception
          basic blocks with the correct edges.
        \item Unconditional jumps are replaced with edges between the
          correct basic blocks.
        \item Conditional jumps are replaced by a conditional basic
          block.
    \end{itemize}
    To greatly simplify the implementation, phase two is allowed to
    produce a degenerated control-flow graph. In particular, the
    following deficiencies are possible:
    \begin{itemize}
    \item Regular basic blocks might be empty.
    \item Some conditional basic blocks might be unnecessary, in that
      they have the same target for both the ``then'' as well as the
      ``else'' branch.
    \item Two consecutive, non-empty, regular basic blocks can exist,
      even if the second block has only exactly one predecessor and
      the two blocks could thus be merged.
    \end{itemize}
    \item \textbf{Phase three.} In the third and last phase, the
      control-flow graph is transformed such that the deficiencies
      remaining from phase two are removed. It is ensured that
      removing one kind of deficiency does not create another
      degenerate case.
\end{enumerate}



\subsubsection{Desugaring}
\label{sec:desugaring}

Desugaring means replacing complicated source language constructs by
simpler ones, or removing syntactic sugar from an input program.
Originally, we intended for the control flow graph representation to
be as close as possible to the Java abstract syntax tree to simplify
the mapping from tree to CFG node and back and to reuse existing
checker code written in terms of trees.  However, we ran into several
cases that were better handled by desugaring.

    \begin{itemize}
    \item We decided to represent implicit conversion operations like
      boxing, unboxing, widening, and narrowing as explicit CFG nodes
      because they change the type of a value.  For example, implicit
      unboxing of an \code{Integer} will be translated into a call to
      \code{Integer.intValue}.  The pre-conversion type can be
      associated with the original node and the post-conversion type
      can be associated with the explicit conversion node.  It also
      makes it possible for the transfer function to operate on the
      conversion nodes.

    \item Enhanced for loops are defined in terms of a complicated
      translation into simpler operations, including field accesses,
      branches, and method calls that could affect dataflow
      information.  It would be prohibitively difficult for a checker
      writer to write a transfer function that correctly accounted for
      all of those operations, so we desugar enhanced for loops.

    \item Once we decided to make conversion nodes explicit it made
      sense to desugar compound assignments.  A compound assignment
      like \begin{verbatim}Integer i; i += 3;\end{verbatim} performs
      both an unboxing and a boxing operation on \code{i}.  Desugaring
      all compound assignments greatly reduced the total number of
      node classes.

    \item String concatenation assignments are also desugared. A string
      concatenation assignment like
      \begin{verbatim}String s; s += "str";\end{verbatim} is represented as
      \begin{verbatim}String s; s = s + "str";\end{verbatim} in CFGs.
      This avoids duplicating the same logic that is necessary in assignment
      and concatenation nodes, which was error prone.

    \end{itemize}

In order to desugar code and still maintain the invariant that every
CFG node maps to a tree, we needed to create new AST tree nodes that
were not present in the input program.  Javac allows us to do this
through non-supported APIs and we wrote some utility classes in
\code{javacutil} to make the process easier.  The new trees are
created during CFG building and they persist as long as some CFG node
refers to them.  However, the trees are not inserted into the AST, so
they are not type-checked or seen by other tree visitors.  Their main
purpose is to carry Java types and to satisfy AnnotatedTypeFactory
methods.

A further complication is that these newly-introduced AST trees are
not part of the TreePath when visiting the AST.  We work around this
problem by giving the AnnotatedTypeFactory a mapping, called the
\code{pathHack}, from newly-introduced trees to their containing
MethodTree and ClassTree.

Possibly even worse, we needed to create fake symbols for variables
created when desugaring enhanced for loops.  Javac does not expose the
ability to create a symbol, so we created a new subclass of
Symbol.VarSymbol called \code{javacutil.tree.DetachedVarSymbol} for
this purpose.  AnnotatedTypeFactory explicitly checks for
DetachedVarSymbols in its DeclarationFromElement method.



\subsubsection{Conversions and node-tree mapping}
\label{sec:conversions}

As mentioned in \autoref{sec:desugaring}, we represent implicit Java
type conversions such as boxing, unboxing, widening, and narrowing by
explicit CFG nodes.  This means that some AST tree nodes correspond to
multiple CFG nodes: a pre-conversion node and a post-conversion node.
We will describe how the conversions work and how the node-tree
mappings are implemented.

Boxing and unboxing are represented in terms of calls to Java standard
library methods.  Boxing corresponds to a call to
\code{BoxedClass.valueOf} while unboxing corresponds to a call to
\code{BoxedClass.\*Value}.  This allows annotations on the library
methods, as well as transfer functions for method invocations, to
apply to the conversions with no special work on the part of a checker
developer.

Widening and narrowing conversions are represented as special
node types, although it would be more consistent to change them into
type casts.

We maintain the invariant that a CFG node maps to zero or one AST tree
and almost all of them map to a single tree.  But we can't maintain a
unique inverse mapping because some trees have both pre- and
post-conversion nodes.  Instead, we remember two mappings, one from
tree to pre-conversion node and, for those trees that were converted,
one from tree to post-conversion node.  Both the CFGBuilder and the
ControlFlowGraph store two separate mappings.  The Analysis class
explicitly stores the tree to pre-conversion node mapping as
\code{treeLookup} and it indirectly uses the tree to post-conversion
mapping in \code{Analysis.getValue(Tree)}.  This has effectively
hidden the distinction between pre and post-conversion nodes from the
Checker Framework, but in the long run it may be necessary to expose
it.



\section{Dataflow Analysis}

This section describes how the dataflow analysis over the control-flow
graph is performed and how to implement a particular analysis.

Roughly, a dataflow analysis in the framework works as follows. Given
the abstract syntax tree of a method, the framework computes the
corresponding control-flow graph as described in
\autoref{sec:cfg}. Then, a simple forward or backward iterative
algorithm is used to compute a fix-point, by iteratively applying a
set of transfer functions to the nodes in the CFG\@.  These transfer
functions are specific to the particular analysis and are used to
approximate the run-time behavior of different statements and expressions.


\subsection{Managing Intermediate Results of the Analysis}
\label{sec:node-mapping}
\label{sec:store-management}


Conceptually, the dataflow analysis computes an abstract value for
every node and flow expression\footnote{Certain dataflow analyses
  might choose not to produce an abstract value for every node.  For
  instance, a constant propagation analysis would only be concerned
  with nodes of a numerical type, and could ignore other nodes.}.  The
transfer function (\autoref{sec:transfer-fnc}) produces these abstract
values, taking as input the abstract values computed earlier for
sub-expressions.  For instance, in a constant propagation analysis,
the transfer function for addition (\code{+}) would look at the
abstract values for the left and right operand, and determine that the
\code{AdditionNode} is a constant if and only if both operands are
constant.

An analysis result contains two parts:

\begin{enumerate}
\item
The \emph{node-value mapping} (\code{Analysis.nodeValues}) maps \code{Node}s to their abstract
values.  Only nodes that can take on an abstract value are
used as keys.  For example, in the Checker Framework, the mapping is
from expression nodes to annotated types.

The framework consciously does not store the abstract value
directly in the node, to remove any coupling between the control-flow
graph and a particular analysis.  This allows the control-flow graph
to be constructed only once, and then reused for different dataflow
analyses.

\item
A set of \emph{stores}.  Each store maps a flow expression to an
abstract value.  Each store is associated with a specific program point.

The stores tracked by an analysis implement the \code{Store}
interface, which defines the following operations:
\begin{itemize}
\item Least upper bound: Compute the least upper bound of two stores
  (e.g., at a merge-point in the control-flow graph).
\item Equivalence: Compare two stores if they are (semantically)
  different, which is used to determine if a fix-point is reached in
  the dataflow analysis. Note that reference-equality is most likely
  not sufficient.
\item Copy mechanism: Clone a store to get an exact copy.
\end{itemize}
The store is analysis-dependent, but the framework provides a default
store implementation which can be reused.  The default implementation
is
\begin{verbatim}org.checkerframework.framework.flow.CFStore\end{verbatim}

What information is tracked in the store depends on the analysis to be
 performed.  Some examples of stores include
\begin{verbatim}
org.checkerframework.checker.initialization.InitializationStore
org.checkerframework.checker.nullness.NullnessStore
\end{verbatim}

Every store is associated with a particular point in the control-flow
graph, and all stores are managed by the framework. It saves an explicit store
for the start of each basic block.
When dataflow information
is requested for a later point in a block, the analysis applies the
transfer function to recompute it from the initial store.

\end{enumerate}

After an analysis has iterated to a fix-point, the computed dataflow
information is maintained in an AnalysisResult, which can map either
nodes or trees to abstract values.


\subsection{Answering Questions}
\label{sec:answering-questions}
After the flow analysis for a particular method has been computed,
there are two kinds of information that have been computed.  Firstly,
the node-value mapping stores an abstract value for every node, and
secondly, the information maintained in various stores is available.

Two kinds of queries are possible to the dataflow analysis after the
analysis is complete:
\begin{enumerate}
    \item For a given AST tree node, what is its abstract value?  Both
      pre- and postconversion values can be retrieved.
      A discussion of conversions can be found in \autoref{sec:conversions}.
    \item For a given AST tree node, what is the state right after
      this AST tree node?  Examples of questions include:
    \begin{itemize}
        \item Which locks are currently held?
        \item Are all fields of a given object initialized?
    \end{itemize}
\end{enumerate}

The store may first need to be (re-)computed, as the framework does not
store all intermediate stores but rather only those for key positions
as described in \autoref{sec:store-management}.

To support both kinds of queries, the framework builds a map from AST
tree nodes (of type \code{com.sun.source.tree.Tree}) to CFG nodes.  To
answer questions of the first type it is then possible to go from the
AST tree node to the CFG node and look up its abstract value in the
node-value mapping (this is provided by the framework).  By default,
the abstract value returned for a tree by
\code{Analysis.getValue(Tree)} includes any implicit conversions
because it uses the mapping from tree node to post-conversion CFG
node.  To request the pre-conversion value, one currently uses the
\code{ControlFlowGraph.treelookup} map directly.

To support questions of the second kind, every node has a reference to
the basic block it is part of. Thus, for a given AST tree node, the
framework can determine the CFG node and thereby the CFG basic block,
and compute the necessary store to answer the question.


\subsection{Transfer Function}
\label{sec:transfer-fnc}

A transfer function is an object that has a transfer method for
every \code{Node} type, and also a transfer method for procedure entry.

\begin{itemize}
\item A transfer method for a \code{Node} type takes a store
  and the node, and produces an updated store. This is achieved by
  implementing the \code{NodeVisitor<S, S>} interface for the store
  type \code{S}.

  These transfer methods also get access to the abstract value of any
  sub-node of the node \code n under consideration.  This is not limited
  to immediate children, but the abstract value for any node contained
  in \code n can be queried.

\item A transfer method for procedure entry returns the initial store, given the
  list of parameters (as \code{LocalVariableNode}s that represent the formal
  parameters) and the
  \code{MethodTree} (useful if the initial store depends on the procedure
  signature, for instance).

\end{itemize}


\subsection{Flow Rules}
\label{sec:flow-rules}

As mentioned in \autoref{sec:cond-stores}, dataflow analysis
conceptually maintains two stores for each program point, a then store
containing information valid when the previous boolean-valued
expression was true and an else store containing information valid
when the expression was false.  In many cases, there is only a single
store because there is no boolean-valued expression to split on or
there was an expression, but it yielded no useful dataflow
information.  However, any CFG edge may potentially have a predecessor
with two stores and a successor with two stores.

We could simply propagate information from both predecessor stores to
both successor stores, but that would throw away useful information,
so we define five flow rules that allow more precise propagation.

\begin{itemize}
    \item \code{EACH_TO_EACH} is the default rule for an edge with a
      predecessor that is not a \code{ConditionalBlock}.  It
      propagates information from the then store of the predecessor to
      the then store of the successor and from the else store of the
      predecessor to the else store of the successor.
    \item \code{THEN_TO_BOTH} is the default rule for an edge from a
      \code{ConditionalBlock} to its then successor.  It propagates
      information from the then store of its predecessor to both the
      then and else stores of the then successor, thereby splitting
      the conditional store to take advantage of the fact that the
      condition is known to be true.
    \item \code{ELSE_TO_BOTH} is the default rule for an edge from a
      \code{ConditionalBlock} to its else successor.  It propagates
      information from the else store of its predecessor to both the
      then and else stores of the else successor, thereby splitting
      the conditional store to take advantage of the fact that the
      condition is known to be false.
    \item \code{THEN_TO_THEN} is a special rule for a short-circuit edge
      from a \code{ConditionalBlock} to its then successor.  It only
      propagates information from the then store of its predecessor to
      the then store of its successor.  This flow rule is used for
      conditional or expressions because the else store of \code{a ||
        b} is not influenced by the then store of \code{a}.
    \item \code{ELSE_TO_ELSE} is a special rule for a short-circuit edge
      from a \code{ConditionalBlock} to its else successor.  It only
      propagates information from the else store of its predecessor to
      the else store of its successor.  This flow rule is used for
      conditional and expressions because the then store of \code{a &&
        b} is not influenced by the else store of \code{a}.
\end{itemize}

Note that the more precise flow rules \code{THEN_TO_THEN} and
\code{ELSE_TO_ELSE} improve the precision of the store they do not
write to.  In other words, \code{THEN_TO_THEN} yields a more precise
else store of its successor by not propagating information to the else
store which might conflict with information already there, and
conversely for \code{ELSE_TO_ELSE}.
See \autoref{sec:cond-exp} for more details and an example.

Currently, we only use flow rules for short-circuiting edges of
conditional ands and ors.  The CFG builder sets the flow rule of each
short-circuiting edge as it builds the CFG for the conditional and/or
expression.

The dataflow analysis logic requires that both the then and the else
store of each block contain some information before the block is
analyzed, so it is a requirement that at least one predecessor block
writes the then store and at least one writes the else store.


\subsection{Concurrency}

By default, the Dataflow Framework analyzes the code as if it is
executed sequentially.  This is unsound if the code is run
concurrently.  Use the \code{-AconcurrentSemantics} command-line
option to enable concurrent semantics.

In the concurrent mode, the dataflow analysis cannot infer any
local information for fields.  This is because after a local update,
another thread might change the field's value before the next use.

An exception to this are monotonic type properties, such as the
\code{@MonotonicNonNull} annotation of the nullness type system.  The
meta-annotation \code{@MonotonicQualifier} declares that a qualifier
behaves monotonically, however it is not yet used to preserve dataflow
information about fields under concurrent semantics.


\section{Example: Constant Propagation}

As an example, consider a constant propagation
analysis for local variables and integer values.
To run constant propagation and create a PDF of the CFG, run
\code{java org.checkerframework.dataflow.cfg.playground.ConstantPropagationPdf MyFile.java}.

\textbf{Abstract values.} A class \code{Constant} is used as an
abstract value, which can either be \emph{top} (more than one integer
value seen), \emph{bottom} (no value seen yet), or \emph{constant}
(exactly one value seen; in which case the value is also stored).

\textbf{The store.} The store maps \code{Node}s to \code{Constant},
where only \code{LocalVariableNode}s and \code{IntegerLiteralNode}s
are used as keys.  Only those two nodes actually are of interest
(there is no addition/multiplication/etc. yet, and other constructs
like fields are not yet supported by the analysis).

Two different instances of \code{LocalVariableNode} can be uses of the
same local variable, and thus the \code{equals} method has been
implemented accordingly. Therefore, every local variable occurs at
most once in the store, even if multiple (equal)
\code{LocalVariableNode}s for it exist.

\textbf{The transfer function.} The transfer function is very
simple. The initial store contains \emph{top} for all parameters, as
any value could have been passed in.  When an integer literal is
encountered, the store is extended to indicate what abstract value
this literal stands for. Furthermore, for an assignment, if the
left-hand side is a local variable, the transfer function updates its
abstract value in the store with the abstract value of the right-hand
side (which can be looked up in the store).

To illustrate how we can have different information in the then and
else block of a conditional, I also implemented another transfer
function that considers the \code{EqualToNode}, and if it is of the
form \code{a == e} for a local variable \code{a} and constant
\code{e}, passes the correct information to one of the branches. This
is also shown in \autoref{fig:ConstSimple}.

\text{Example.} A small example is shown in \autoref{fig:ConstSimple}.

\flow{ConstSimple}{.45}{1}{Simple sequential program to illustrate constant
  propagation.  Intermediate analysis results are shown.}


\section{Example: Live Variable}
\label{sec:live_variable}

A live variable analysis for local variables and fields was implemented
to show the backward analysis works properly.
To run live varibale analysis and create a PDF of the CFG, run
\code{java org.checkerframework.dataflow.cfg.playground.LiveVariablePdf MyFile.java}.

\textbf{Abstract values.} A class \code{LiveVarValue} is a live
variable (which is represented by a node) wrapper turning node into
abstract value. A node can be \code{LocalVariableNode} or \code{FieldAccessNode}.

\textbf{The store.} The live variable store \code{LiveVarStore} has a field
\code{liveVarValueSet} which contains a set of \code{LiveVarValue}s. Only
\code{LocalVariableNode} or \code{FieldAccessNode} will be considered as a
live variable and added to \code{liveVarValueSet}. The store defines methods
\code{putLiveVar(LiveVarValue)} and \code{killLiveVar(LiveVarValue)} to add
and kill live variables.

\textbf{The transfer function.} The transfer function \code{LiveVarTransfer}
initializes empty stores at normal and exceptional exit blocks (because this
is a backward transfer function). The transfer function visits assignments to
update the live variable values in the stores.

\textbf{Example.} An example is shown in \autoref{fig:LiveSimple}.

\flow{LiveSimple}{.33}{1}{Simple sequential program to illustrate live variable.  Intermediate analysis results are shown.}

\section{Example: Very Busy Expression Analysis}

An expression is ``very busy'' if no matter what path is taken,
the expression is used before any variables occurring in it are redefined.
This analysis is a backward analysis that intersects abstract states from successors.
\\
To run very busy analysis and create a PDF of the CFG, run \\
\code{java org.checkerframework.dataflow.cfg.playground.BusyExpressionPdf MyFile.java}

\textbf{Abstract values.} The \code{BusyExprValue} is a very busy expression, represented by a node.
The node can be any \code{BinaryOperationNode}, such as \code{NumericalAdditionNode}
or \code{LeftShiftNode}.

\textbf{The store.} The busy expression store \code{BusyExprStore} has a field
\code{busyExprValueSet} which contains a set of \code{BusyExprValue}s.
If a node is a \code{BinaryOperationNode}, \code{addUseInExpression(Node)}
will recursively analyze the subexpressions of the node to determine
if they are nested \code{BinaryOperationNode}s. It then uses \code{putBusyExpr(BusyExprValue)}
to put the expression in the set. \code{killBusyExpr(Node)} is used when a variable is re-assigned.

\textbf{The transfer function.} The transfer function \code{BusyExprTransfer}
initializes empty stores at normal and exceptional exit blocks (because this
is a backward transfer function). The transfer function visits assignment, method invocation,
and return statement nodes to update the busy expression values in the stores.

\textbf{Example.} An example is shown in \autoref{fig:BusyExprSimple}.

\flow{BusyExprSimple}{.33}{1}{Very Busy Expression Analysis illustration. Intermediate analysis results are shown.}

\section{Example: Reaching Definition Analysis}

The reaching definitions for a given program point are those assignments that
may have updated the current values of variables.
This reaching definition analysis is a standard example of a forward analysis.

\textbf{Abstract values.} A class \code{ReachingDefinitionNode} is used as an
abstract value, which can only wrap \code{AssignmentNode}. The reaching definition analysis
processes such values in the store.

\textbf{The store.} The reaching definition store \code{ReachingDefinitionStore} has a field
\code{reachingDefSet} which contains a set of \code{ReachingDefinitionNode}s. The store defines methods
\code{putDef(ReachingDefinitionValue)} and \code{killDef(Node)} to add
and kill reaching definitions.

\textbf{The transfer function.} The transfer function visits \code{AssignmentNode} to
update the information of reaching definitions in the stores. A reaching definition will
be killed in the store when its left-hand side is same as that of the new generated value, and the new
generated value will be added into the store.

\textbf{Example.} An example is shown in \autoref{fig:ReachSimple}.

\flow{ReachSimple}{.33}{1}{Simple sequential program to illustrate reaching definitions.  Intermediate analysis results are shown.}


\section{Default Analysis}


\subsection{Overview}

The default flow-sensitive analysis \code{org.checkerframework.framework.flow.CFAnalysis}
works for any checker defined in the
Checker Framework.  This generality is both a strength and a weakness
because the default analysis can always run but the facts it can
deduce are limited.  The default analysis is extensible so checkers
can add logic specific to their own qualifiers.

The default flow-sensitive analysis takes advantage of several forms
of control-flow to improve the precision of type qualifiers.  It
tracks assignments to flow expressions, propagating type qualifiers
from the right-hand side of the assignment.  It considers equality and
inequality tests to propagate the most precise qualifiers from the
left or right-hand side to the true (resp. false) successor.  It also
applies type qualifiers from method postconditions after calls.

% \begin{workinprogress}
% Preconditions are not mentioned at all in this manual. How are they handled?
% \end{workinprogress}


\subsection{Interaction of the Checker Framework and the Dataflow Analysis}
\label{sec:flow-cf-interaction}

This section describes how the dataflow analysis is integrated into the
Checker Framework to enable flow-sensitive type checking.

A main purpose of a type factory is to create an \code{AnnotatedTypeMirror}
based on an input tree node. Using the results of the dataflow analysis,
the type factory can return a more refined type than otherwise possible.

Type factories that extend from \code{GeneralAnnotatedTypeFactory}
and set the constructor parameter \code{useFlow} to true will automatically
run dataflow analysis and use the result of the analysis when creating an
\code{AnnotatedTypeMirror}.  The first time that a \code{GenericAnnotatedTypeFactory}
instance visits a \code{ClassTree}, the type factory runs the dataflow analysis
on all the field initializers of the class first, then the bodies of methods in
the class, and then finally the dataflow analysis is
run recursively on the members of nested classes. The result of
dataflow analysis is stored in the \code{GenericAnnotatedTypeFactory} instance.

When creating an \code{AnnotatedTypeMirror} for a tree node, the type factory
queries the result of the dataflow analysis to determine if a more refined
type for the tree node was inferred by the analysis. This is the first
type of query described in \autoref{sec:answering-questions}.

Dataflow itself uses the type factory to get the initial
\code{AnnotatedTypeMirror} for a tree node in the following way.

For a given node \code{n}
\begin{itemize}
\item
    If it has no corresponding AST tree node, use ``top'' as its
    abstract value.
\item
    If it has a corresponding AST tree node, ask the
    \code{AnnotatedTypeFactory} about the type of the tree node.  The
    factory will then use its checker-dependent logic to compute this
    type.  A typical implementation will look at the type of sub-trees
    and compute the overall type based on the information about these
    sub-trees.

    Note that the factory recursively uses information provided by the
    flow analysis to determine the types of sub-trees.  There is a
    check in \code{Analysis.getValue} that the node whose type is
    being requested is a sub-node of the node to which the transfer
    function is currently being applied.  For other nodes, the
    analysis will return \code{null} (i.e., no information) and the
    factory will return the flow-insensitive annotated type.

\end{itemize}

The \code{AnnotatedTypeFactory} and helper classes need to be prepared to work
even when dataflow analysis is not finished yet. Code should either check
whether dataflow analysis is still in progress (using
\code{analysis.isRunning()}) or handle possible null values. The factory should
return conservative, flow-insensitive types if the analysis is still in
progress.


\subsection{The Checker Framework Store and Dealing with Aliasing}

\begin{workinprogress}
    Word of caution: The exact rules of what information is
        retained may or may not be implemented exactly as described
        here.  This is a good starting point in any case, but if very
        precise information is needed, then the source code is very
        readable and well documented.
\end{workinprogress}

The Dataflow Framework provides a default implementation of a store
with the class \code{CFAbstractStore}, which is used (as
\code{CFStore}) as the default store if a checker does not provide its
own implementation.  This implementation of a store tracks the
following information:

\begin{itemize}
    \item Abstract values of local variables.
    \item Abstract values of fields where the receiver is an
          access sequence compose of the following:
    \begin{itemize}
    \item Field access.
    \item Local variable.
    \item Self reference (i.e., \code{this}).
    \item Pure or non-pure method call.
    \end{itemize}
\end{itemize}

The most challenging part is ensuring that the information about field
accesses is kept up to date in the face of incomplete aliasing
information.  In particular, at method calls and assignments care
needs to be taken about which information is still valid afterwards.

The store maintains a mapping from field accesses (as defined in
Section~\ref{sec:field-access}) to abstract values, and the subsequent
sections describe the operations that keep this mapping up-to-date.

A side-effect-free method has no visible side-effects, such
as setting a field of an object that existed before the method was called.
A deterministic method returns the same value (according to ==) every time
it is called with the same arguments and in the same environment.
A pure method is both side-effect-free and deterministic.
The Dataflow Framework respects the
\href{https://checkerframework.org/api/org/checkerframework/dataflow/qual/SideEffectFree.html}{\code{@SideEffectFree}},
\href{https://checkerframework.org/api/org/checkerframework/dataflow/qual/Deterministic.html}{\code{@Deterministic}},and
\href{https://checkerframework.org/api/org/checkerframework/dataflow/qual/Pure.html}{\code{@Pure}}
annotations of the Checker Framework


\subsubsection{Internal Representation of field access}
\label{sec:field-access}

To keep track of the abstract values of fields, the Dataflow Framework
uses its own representation of field accesses (that is different from
the \code{Node} type introduced earlier).  This data type is defined
inductively as follows:

\begin{bnfgrammar}
    \production{\nonterminal{FieldAccess}}
        {\nonterminal{Receiver} \nonterminal{Field} \qquad Java field (identified by its \code{Element})}
    \production{\nonterminal{Receiver}}
        {\nonterminal{SelfReference} \qquad \code{this}}
    \altline{\nonterminal{LocalVariable} \qquad local variable (identified by its \code{Element})}
    \altline{\nonterminal{FieldAccess}}
    \altline{\nonterminal{MethodCall} \qquad Java method call of a method}
    \altline{\nonterminal{Unknown} \qquad any other Java expression}
    \production{\nonterminal{MethodCall}}
        {\nonterminal{Receiver} \literal{.} \nonterminal{Method}
            \literal{(} \nonterminal{Receiver}$^{,*}$ \literal{)}}
\end{bnfgrammar}

\nonterminal{Unknown} is only used to determine which
information needs to be removed (e.g., after an assignment), but no field
access that contains \nonterminal{Unknown} is stored in the mapping to
abstract values.  For instance, \nonterminal{Unknown} could stand for
a non-pure method call, an array access, or a ternary expression.


\subsubsection{Updating Information in the Store}

\newcommand{\alias}{\operatorname{might\_alias}}

In the following, let $o$ be any \nonterminal{Receiver}, $x$ a local
variable, $f$ a field, $m$ a pure method, and $e$ an expression.
Furthermore, we assume to have access to a predicate $\alias(o_1,o_2)$
that returns true if and only if~$o_1$ might alias~$o_2$; see
Section~\ref{sec:alias}.

\subsubsubsection{At Field Assignments}

For a field update of the form $o_1.f_1 = e$, the
dataflow analysis first determines the abstract value $e_\text{val}$ for $e$.
Then, it updates the store $S$ as follows.
\begin{enumerate}
    \item For every field access $o_2.f_1$ that is a key in $S$, remove its
      information if
      $o_2$ lexically contains a \nonterminal{Receiver} that \emph{might}
        alias $o_1.f_1$ as determined by the $\alias$
        predicate.
      Note that the ``lexically contains'' notion for pure method calls
        includes both the receiver as well as the arguments.

      This includes the case where $o_1 = o_2$ (that is, they are
      syntactically the same) and the case where $o_2$ and $o_1.f_1$ might be
      aliases.

    \item $S[o_1.f_1] = e_\text{val}$
\end{enumerate}


\subsubsubsection{At Local Variable Assignments}

For a local variable assignment of the form $x = e$,
the dataflow analysis first determines the abstract value $e_\text{val}$ for
$e$.
Then, it updates the store $S$ as follows.
\begin{enumerate}
    \item For every field access $o_2.f_2$ that is a key in $S$,
      remove its information if $o_2$ lexically contains a
      \nonterminal{Receiver} that \emph{might} alias $x$ as determined
      by the $\alias$ predicate.
    \item $S[x] = e_\text{val}$
\end{enumerate}

\subsubsubsection{At Other Assignments}

For any other assignment $z = e$ where the
assignment target $z$ is not a field or local variable,
the dataflow analysis first determines the abstract value $e_\text{val}$ for
$e$.
Then, it updates the store $S$ as follows.
\begin{enumerate}
    \item For every field access $o_2.f_2$, remove its information if
    $o_2$ lexically contains a \nonterminal{Receiver} that \emph{might}
    alias $z$ as determined by the $\alias$ predicate.
\end{enumerate}


\subsubsubsection{At Non-Pure Method Calls}

A non-pure method call might modify the value of any field arbitrarily.
Therefore, at a method call, any information about fields is lost.


\subsubsubsection{Alias Information}
\label{sec:alias}

The Checker Framework does not include an aliasing analysis, which could
provide precise aliasing information.  For this reason, we implement the
predicate $\alias$ as follows:
\[ \alias(o_1,o_2) :=
\left(\operatorname{type}(o_1) <: \operatorname{type}(o_2)
\;\;\text{or}\;\;
\operatorname{type}(o_2) <: \operatorname{type}(o_1) \right) \]
where $\operatorname{type}(o)$ determines the Java type of a reference $o$
and $<:$ denotes standard Java subtyping.


%  LocalWords: pre cfg javacutils InternalUtils DivideByZero BlockImpl se SingleSuccessorBlock RegularBlock SingleSuccessorBlockImpl ExceptionBlock SpecialBlock ConditionalBlock SpecialBlocks
%  LocalWords: ControlFlowGraph ControlFlowGraphs CFGBuilder ast CFValue JavaExpressions AbstractValue AbstractValues PolyNull Poly AnnotatedTypeMirrors CFAbstractValue NullnessValue CFStore
%  LocalWords: NullnessCheckers CFAbstractStore InitializationStore Regex NullnessStore TransferInput TransferResult TransferInputs ConditionalTransferResult RegularTransferResult CFTransfer
%  LocalWords: TransferFunction NodeVisitor TransferResults isRegex lst CFAbstractTransfer AbstractNodeVisitor asRegex ClassTree InitializationTransfer visitMethodInvocation RegexTransfer
%  LocalWords: CFAbstractAnalysis NullnessTransfer RegexChecker's lp AnalysisResult AnnotatedTypeFactory AnnotatedTypeFactorys AnnotationProvider getAnnotatedType BaseTypeChecker
%  LocalWords: GenericAnnotatedTypeFactory FlowAnalysis CFAnalysis BaseAnnotatedTypeFactory CFGSimple ValueLiteral txt BooleanLiteral CharacterLiteral DoubleLiteral FloatLiteral
%  LocalWords: IntegerLiteral LongLiteral NullLiteral ShortLiteral args StringLiteral Accessor ArrayAccess FieldAccess This MethodAccess ExplicitThis ImplicitThis NullAway
%  LocalWords: LocalVariable MethodInvocation BitwiseAnd BitwiseOr MyType BitwiseComplement BitwiseXor ConditionalAnd ConditionalNot ConditionalOr FloatingDivision FloatingRemainder LeftShift
%  LocalWords: IntegerDivision IntegerRemainder NumericalAddition EqualTo NumericalMinus NumericalMultiplication NumericalPlus util NumericalSubtraction SignedRightShift StringConcatenate
%  LocalWords: TernaryExpression UnsignedRightShift NotEqual GreaterThan GreaterThanOrEqual LessThan ArrayCreation ObjectCreation TypeCast cond
%  LocalWords: InstanceOf NarrowingConversion StringConversion WideningConversion ArrayType ParameterizedType ClassName PrimitiveType PackageName AssertionError NullChk accessor
%  LocalWords: VariableDeclaration FieldAssignment CFGFieldAssignment getRegularStore CFGConditionalOr fnc CFGConditionalOr2 ConditionalOrNode ConditionalOr2 valueOf
%  LocalWords: IdentifierTree FieldAccessNode ImplicitThisNode unboxing TreePath pathHack MethodTree VarSymbol DetachedVarSymbols DeclarationFromElement BoxedClass treeLookup getValue
%  LocalWords: nodeValues AdditionNode postconversion treelookup LocalVariableNode AconcurrentSemantics MonotonicNonNull MonotonicQualifier IntegerLiteralNode EqualToNode unshaded
%  LocalWords: ConstSimple SelfReference PureMethodCall javacutil stubparser TreeParser TreeBuilder DetachedVarSymbol LiveSimple
% LocalWords:  Uber's Nullsafe CFGVisualizeLauncher intra boolean subparts ValueLiteralNode c' lang BooleanLiteralNode CharacterLiteralNode DoubleLiteralNode FloatLiteralNode ThisNode
% LocalWords:  LongLiteralNode NullLiteralNode ShortLiteralNode StringLiteralNode ArrayAccessNode MethodAccessNode ExplicitThisNode SuperNode MethodInvocationNode BinaryOperationNode
% LocalWords:  UnaryOperationNode BitwiseAndNode BitwiseComplementNode BitwiseOrNode BitwiseXorNode ConditionalAndNode ConditionalNotNode FloatingDivisionNode FloatingRemainderNode b1
% LocalWords:  LeftShiftNode IntegerDivisionNode IntegerRemainderNode NumericalAdditionNode b2 b3 NumericalMinusNode NumericalMultiplicationNode NumericalPlusNode NotEqualNode num0
% LocalWords:  NumericalSubtractionNode SignedRightShiftNode StringConcatenateNode GreaterThanNode SwitchExpressionNode TernaryExpressionNode UnsignedRightShiftNode LessThanNode
% LocalWords:  GreaterThanOrEqualNode LessThanOrEqualNode CaseNode AssignmentNode ArrayCreationNode StringConcatenateAssignmentNode ObjectCreationNode FunctionalInterfaceNode ThrowNode
% LocalWords:  TypeCastNode InstanceOfNode instanceof NarrowingConversionNode StringConversionNode WideningConversionNode ArrayTypeNode ParameterizedTypeNode PrimitiveTypeNode Varargs
% LocalWords:  ClassNameNode PackageNameNode ReturnNode AssertionErrorNode SynchronizedNode varargs MarkerNode NullChkNode VariableDeclarationNode ClassDeclarationNode NoSuchFieldError
% LocalWords:  LambdaResultExpressionNode ArrayIndexOutOfBoundsException Throwable ClassFormatError ArithmeticException NegativeArraySizeException ClassCircularityError CFGIfStatement
% LocalWords:  NoClassDefFoundError unaryAssignNodeLookup assertionsEnabled CFGAssert CFGSwitch
% LocalWords:  intValue
